Semi-supervised learning integrated with classifier combination for word sense disambiguation

نویسندگان

  • Anh-Cuong Le
  • Akira Shimazu
  • Van-Nam Huynh
  • Minh Le Nguyen
چکیده

Word sense disambiguation (WSD) is the problem of determining the right sense of a polysemous word in a certain context. This paper investigates the use of unlabeled data for WSD within a framework of semi-supervised learning, in which labeled data is iteratively extended from unlabeled data. Focusing on this approach, we first explicitly identify and analyze three problems inherently occurred piecemeal in the general bootstrapping algorithm; namely the imbalance of training data, the confidence of new labeled examples, and the final classifier generation; all of which will be considered integratedly within a common framework of bootstrapping. We then propose solutions for these problems with the help of classifier combination strategies. This results in several new variants of the general bootstrapping algorithm. Experiments conducted on the English lexical samples of Senseval-2 and Senseval-3 show that the proposed solutions are effective in comparison with previous studies, and significantly improve supervised WSD. 2007 Elsevier Ltd. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Theme: A Study of Classifier Combination and Semi-Supervised Learning for Word Sense Disambiguation

1. Aims Word Sense Disambiguation (WSD) involves the association of a polysemous word in a text or discourse with a particular sense among numerous potential senses of that word. In my thesis, we present a study of classifier combination and semi-supervised learning for WSD, which aim to boost supervised WSD and improve accuracy of WSD. In addition, we also work on context representation and fe...

متن کامل

A Combination of Active Learning and Semi-supervised Learning Starting with Positive and Unlabeled Examples for Word Sense Disambiguation: An Empirical Study on Japanese Web Search Query

This paper proposes to solve the bottleneck of finding training data for word sense disambiguation (WSD) in the domain of web queries, where a complete set of ambiguous word senses are unknown. In this paper, we present a combination of active learning and semi-supervised learning method to treat the case when positive examples, which have an expected word sense in web search result, are only g...

متن کامل

Investigating Problems of Semi-supervised Learning for Word Sense Disambiguation

Word Sense Disambiguation (WSD) is the problem of determining the right sense of a polysemous word in a given context. In this paper, we will investigate the use of unlabeled data for WSD within the framework of semi supervised learning, in which the original labeled dataset is iteratively extended by exploiting unlabeled data. This paper addresses two problems occurring in this approach: deter...

متن کامل

Semi-Supervised Learning for Word Sense Disambiguation: Quality vs. Quantity

In this paper, we discuss the importance of the quality against the quantity of automatically extracted examples for word sense disambiguation (WSD). We first show that we can build a competitive WSD system with a memory-based classifier and a feature set reduced to easily and efficiently computable features. We then show that adding automatically annotated examples improves the performance of ...

متن کامل

A Bayesian Approach to Semi-Supervised Learning

Recent research in automated learning has focused on algorithms that learn from a combination of tagged and untagged data. Such algorithms can be referred to as semi-supervised in contrast to unsupervised, which refers to algorithms requiring no tagged data whatsoever. This paper presents a Bayesian approach to semi-supervised learning. In this approach, the parameters of a probability model ar...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computer Speech & Language

دوره 22  شماره 

صفحات  -

تاریخ انتشار 2008